Abstract. - We introduce an algorithm which estimates the number of circuits in a graph
as a function of their length. This approach provides analytical results for the typical entropy
of circuits in sparse random graphs. When applied to real-world networks, it allows one to
estimate exponentially large numbers of circuits in polynomial time. We illustrate the method by
studying a graph of the Internet structure.



Introduction. An increasing amount of data has been collected on the topology of real-world
networks appearing in many different contexts, the Internet being only one of many
examples [1]. A natural line of research in this field consists in identifying characteristic
features of the networks, to compare them with theoretical models and potentially disprove
the latter. The simplest of these properties is the distribution of vertex degrees, which has
been repeatedly argued to exhibit power-law tails. Quite generally, it is computationally easy
to measure the 'local' properties of a given network (those involving a vertex and a finite
number of neighbors); for instance loops with fewer than 5 edges in the Internet graph have
been studied in [2]. However, it might well be that the most distinctive features of real-world
networks are 'global' (i.e. that they depend on an extensive portion of the graph): measuring
them then becomes a very challenging numerical problem.

Among these global properties, we shall consider in the present work the number of long
circuits in a graph, i.e. of circuits visiting a finite fraction of the vertices [3]. These circuits
are roughly exponentially numerous in the size of the graph: the use of exact algorithms [4],
whose complexity is linear in the number of cycles to be enumerated, is therefore possible
for small sizes only. Another trace of this difficulty lies in the NP-completeness of the decision
problem of knowing if a graph is Hamiltonian (i.e. if it contains a loop visiting all vertices) [5].
To overcome this difficulty it is reasonable to look for approximate algorithms with running
times scaling polynomially with the network size. A formal result in this direction is the
existence of a probabilistic algorithm for the approximate counting of Hamiltonian cycles in
graphs with large minimal connectivity [6]. Very recently a sampling method based on
Markov chain Monte Carlo has been proposed [7].
In this letter we introduce a counting algorithm, relying on a statistical mechanics approach
expanding on the results of [8] (see also [9]), the details of which shall be exposed in a longer
publication [10]. The algorithm is first applied to a real-world network, then to random
graphs. In the latter context the number of circuits of a given length is a random variable,
whose properties have been thoroughly studied by Garmo [11] in the regular case (all vertices
have the same degree, see also [12] for a review). For the Erdős-Rényi ensemble (in which
vertex degrees are Poisson distributed), most results have been obtained in the vicinity of the
percolation transition, or on the contrary for very large average degrees [13]. The calculation
of the expected number of circuits in ensembles with arbitrary degree distributions has been
recently performed by Bianconi and Marsili [14]. However these expectations turn out to be
dominated by atypical (exponentially rare) graphs with exponentially more circuits than the
typical ones.

Definitions and algorithm. Let $G = (V, E)$ be a graph, where $V$ and $E$ are the sets of
vertices and edges respectively. The size of $G$ is the number of vertices, $N = |V|$. A circuit
$C = (V_C, E_C)$ is a closed path on the graph visiting each vertex at most once. The length $L$
of the circuit is the number of visited vertices or edges, $L = |V_C| = |E_C|$. A circuit visiting all
vertices ($L = N$) is called Hamiltonian. Our aim is to count the number of distinct circuits
of a given graph $G$, $\mathcal{N}_L(G)$, as a function of their length $L$; more precisely, we define below
a procedure estimating the entropy $\sigma(\ell) = (\ln \mathcal{N}_L)/N$ of circuits of length $L = N\ell$. The reduced
length $\ell$ is an intensive parameter in $[0, 1]$. For $i \in V$, we call $\partial i$ the set of neighbors of the
vertex $i$, and use the symbol $\setminus$ to subtract an element of a set: if $j$ is a neighbor of $i$,
$\partial i \setminus j$ will be the set of all neighbors of $i$ distinct from $j$. We denote by $i \to j$, $j \to i$ the
two oriented edges that can be built from $(ij) \in E$: see Fig. 1 for an illustration of these definitions.

The basic idea of our approach is to introduce a probability law $p(C; G, u) = u^{|E_C|}/Z(G, u)$
over the set of circuits of $G$. Hence the normalization factor $Z(G, u) = \sum_C u^{|E_C|}$ is equal to the
generating function of the number of circuits, $\sum_L \mathcal{N}_L(G)\, u^L$. In the limit of large graphs and
circuit lengths, the saddle-point method leads to the relation

\[ \frac{1}{N} \ln Z(G, u) = \max_{\ell}\left[\sigma(\ell) + \ell \ln u\right] . \]

This relation can be inverted with standard Legendre transformations, and the entropy $\sigma$ can
be expressed in terms of the partition function $Z$. An estimate of $Z$ can then be obtained
by using the Bethe approximation of the corresponding statistical mechanics model, or by
means of Monte Carlo simulation [7]. Following the former road, and using the well known
correspondence between minimization of the Bethe free-energy and iterations of the Belief
Propagation equations [15], one is led to the following algorithm (see [10] for details):



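Circuit Counting Algorithm
Input: a graph $G = (V, E)$, $u$ a real positive number.
Operation: iterate the set of $2|E|$ recursive equations

\[ y_{i\to j}^{(T+1)} = \frac{u \sum_{k \in \partial i \setminus j} y_{k\to i}^{(T)}}{1 + u^2 \sum_{k<k' \in \partial i \setminus j} y_{k\to i}^{(T)}\, y_{k'\to i}^{(T)}} \tag{1} \]

from a randomly chosen initial condition $y^{(0)}_{i\to j}$, until it converges (within some a priori
accuracy) to a fixed point $y^*_{i\to j}(G, u)$.
Output: estimate of the entropy $\sigma(\ell)$ of circuits of length $N\ell$ with

\[ \ell(u) = \frac{1}{N} \sum_{(ij) \in E} y_{(ij)} , \qquad y_{(ij)} = \frac{u\, y^*_{i\to j}\, y^*_{j\to i}}{1 + u\, y^*_{i\to j}\, y^*_{j\to i}} , \tag{2} \]

\[ \sigma(\ell(u)) = -\ell(u) \ln u + \frac{1}{N} \sum_{i \in V} \ln\Big(1 + u^2 \sum_{j<k \in \partial i} y^*_{j\to i}\, y^*_{k\to i}\Big) - \frac{1}{N} \sum_{(ij) \in E} \ln\big(1 + u\, y^*_{i\to j}\, y^*_{j\to i}\big) . \tag{3} \]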
The procedure has to be repeated with different values of $u$ to reconstruct a parametric
plot of $\sigma(\ell)$ for $\ell \in [0, \ell_{\max}]$. For small values of $u$, the iteration equations converge to the
trivial solution, $y^* = 0$ for all edges. The minimal value of $u$ yielding a non-trivial solution,
$u_0$, is related to the slope of the entropy at the origin, $d\sigma/d\ell|_0 = -\ln u_0$.
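
The Legendre inversion invoked above can be written out explicitly. The lines below are only a sketch, under the assumption that $\sigma(\ell)$ is differentiable and concave, so that the maximum in the saddle-point relation is reached at a stationary point $\ell(u)$:

\[ \frac{1}{N}\ln Z(G,u) = \sigma(\ell(u)) + \ell(u)\ln u , \qquad \sigma'(\ell(u)) = -\ln u , \]

\[ \ell(u) = \frac{d}{d\ln u}\left[\frac{1}{N}\ln Z(G,u)\right] , \qquad \sigma(\ell(u)) = \frac{1}{N}\ln Z(G,u) - \ell(u)\ln u . \]

Since $\ell(u) \to 0$ when $u$ decreases to $u_0$, the stationarity condition reproduces the slope at the origin, $\sigma'(0) = -\ln u_0$.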

The algorithm runs in time growing polynomially with the graph size and logarithmically
with the required accuracy on the fixed-point solution. For generic graphs, one can guarantee
neither the convergence of the iteration, nor the validity of the Bethe approximation (see [16]
for the computation of corrections). This approximation is however expected to be correct for
large random graphs, and should be reasonable for most real-world networks.
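
The iteration and its output can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `count_circuits_bp`, the parallel update schedule, the tolerance and the initial range of the messages are all assumptions made here. The output formulas used are the standard Bethe expressions, with edge weight $w = u\, y_{i\to j}\, y_{j\to i}$, $\ell = (1/N)\sum_{(ij)} w/(1+w)$, and $\sigma = -\ell \ln u + (1/N)[\sum_i \ln(1 + u^2 \sum_{j<k} y_{j\to i} y_{k\to i}) - \sum_{(ij)} \ln(1+w)]$.

```python
import math
import random


def count_circuits_bp(edges, u, max_iter=10_000, tol=1e-12, seed=0):
    """Return (ell, sigma, messages) at the BP fixed point for fugacity u.

    `edges` is a list of undirected pairs (i, j) without duplicates or
    self-loops (an assumption of this sketch). N is taken as the number of
    vertices appearing in at least one edge.
    """
    rng = random.Random(seed)
    nbrs = {}
    for i, j in edges:
        nbrs.setdefault(i, set()).add(j)
        nbrs.setdefault(j, set()).add(i)
    n = len(nbrs)
    # One message per oriented edge i -> j, random initial condition.
    y = {(i, j): rng.uniform(0.1, 1.0) for i in nbrs for j in nbrs[i]}

    def pair_sum(vals):
        # sum_{k<k'} v_k v_k' = ((sum v)^2 - sum v^2) / 2; zero on empty input.
        s = sum(vals)
        return 0.5 * (s * s - sum(v * v for v in vals))

    for _ in range(max_iter):
        new = {}
        for (i, j) in y:
            incoming = [y[(k, i)] for k in nbrs[i] if k != j]
            new[(i, j)] = u * sum(incoming) / (1.0 + u * u * pair_sum(incoming))
        delta = max(abs(new[e] - y[e]) for e in y)
        y = new
        if delta < tol:
            break
    else:
        raise RuntimeError("BP iteration did not converge")

    ell = 0.0
    edge_term = 0.0
    for i, j in edges:
        w = u * y[(i, j)] * y[(j, i)]
        ell += w / (1.0 + w)          # probability that edge (ij) is in a circuit
        edge_term += math.log1p(w)
    vertex_term = sum(
        math.log1p(u * u * pair_sum([y[(j, i)] for j in nbrs[i]])) for i in nbrs
    )
    ell /= n
    sigma = -ell * math.log(u) + (vertex_term - edge_term) / n
    return ell, sigma, y


# Usage: the complete graph on 4 vertices at u = 1.
k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
ell, sigma, _ = count_circuits_bp(k4, 1.0)
```

On this small example all messages converge to $y^* = 1$, giving $\ell = 3/4$ and $\sigma = (\ln 2)/2$. Such a small, loopy graph only exercises the code; the Bethe estimate is not expected to be accurate there.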

Besides the global information $\sigma(\ell)$, the algorithm gives a local description of the circuits
of the graph, through the quantities $y^*$, called messages hereafter. These have indeed the
following interpretation: for $(ij) \in E$, $y_{(ij)}$ defined in Eq. (2) is the probability that the edge
is present in a circuit $C$ drawn from the distribution $p(C; G, u)$, i.e. the fraction of the circuits
of length $L$ which go through $(ij)$.

Note that in Eq. (1) we used the convention that sums on empty sets are null. In particular,
$y_{i\to j} = 0$ if $j$ is the only neighbor of $i$, in other words if $i$ is a leaf of the graph. Moreover
if all incoming messages on an edge are vanishing, the outgoing message is also null. This
simple remark implies that edges $(ij)$ with at least one of their fixed-point messages $y^*_{i\to j}$,
$y^*_{j\to i}$ vanishing are exactly the ones which would be erased in the leaf removal procedure to
compute the 2-core (maximal subgraph in which all vertices have degree at least 2) of the
graph [17]. This property could be expected: by definition, no circuit can be drawn outside
of the 2-core.
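
The leaf removal procedure referred to above can be sketched as follows. The function name `two_core` and the edge-list input format are illustrative choices, not taken from [17].

```python
def two_core(edges):
    """Return the edge set of the 2-core as a set of frozensets {i, j}.

    Repeatedly deletes vertices of degree 0 or 1 together with their edge,
    until every remaining vertex has degree at least 2.
    """
    nbrs = {}
    for i, j in edges:
        nbrs.setdefault(i, set()).add(j)
        nbrs.setdefault(j, set()).add(i)
    leaves = [v for v in nbrs if len(nbrs[v]) <= 1]
    while leaves:
        v = leaves.pop()
        if v not in nbrs:          # already removed
            continue
        for w in nbrs.pop(v):
            nbrs[w].discard(v)
            if len(nbrs[w]) <= 1:  # w has just become a leaf (or isolated)
                leaves.append(w)
    return {frozenset((i, j)) for i in nbrs for j in nbrs[i]}


# Usage: a triangle with a dangling path of two edges attached to vertex 2.
core = two_core([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)])
```

Here only the three triangle edges survive; the edges (2,3) and (3,4) are the ones that carry a vanishing message.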

Application to real-world graphs: approximate counting. As an illustrative example, we
present in Fig. 2 the output of the algorithm when applied to the graph of the Internet
structure at the Autonomous System level, using preliminary data from the DIMES measurement
project [18]. The original graph contained $N = 14291$ vertices and $M = 33666$ edges. For
simplicity we plot the results in units of its 2-core size, $N_{\rm core} = 9694$ (the 2-core contains
$M_{\rm core} = 29069$ edges). Two features of this entropy curve can be underlined. According to
our algorithm, the most numerous circuits contain 1555 edges, and there are around $10^{729}$
of them (certainly out of reach of any direct enumeration); the longest ones contain
$L_{\max} \approx 2710$ edges. The agreement between exact enumerations for short circuits of length
$L = 3, 4, 5$ (enumerating longer ones becomes excessively costly) and the results of the
algorithm is quantitatively decent (see inset of Fig. 2). A rough analysis of the local information
provided by the algorithm shows that high-degree vertices generally belong to a higher
fraction of circuits than poorly connected ones; there are however strong fluctuations around this
general trend.
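
For very short circuits, an exact count of the kind used for comparison can be obtained by brute force. The sketch below is one simple way to do it, not the enumeration algorithm of [4]. The function name is hypothetical, and vertex labels are assumed to be mutually comparable (e.g. integers) so that each circuit can be rooted at its smallest vertex.

```python
def count_circuits_by_length(edges, max_len):
    """Count circuits of length 3..max_len by depth-first search.

    Each circuit is explored from its smallest vertex in both orientations,
    hence the final division by 2.
    """
    nbrs = {}
    for i, j in edges:
        nbrs.setdefault(i, set()).add(j)
        nbrs.setdefault(j, set()).add(i)
    counts = {length: 0 for length in range(3, max_len + 1)}

    def extend(start, v, visited, length):
        # `length` is the number of vertices on the current path.
        for w in nbrs[v]:
            if w == start and length >= 3:
                counts[length] += 1            # closing edge back to the root
            elif w > start and w not in visited and length < max_len:
                visited.add(w)
                extend(start, w, visited, length + 1)
                visited.remove(w)

    for s in nbrs:
        extend(s, s, {s}, 1)
    return {length: c // 2 for length, c in counts.items()}


# Usage: the complete graph on 4 vertices has 4 triangles and 3 four-cycles.
k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
print(count_circuits_by_length(k4, 4))
```

The running time grows with the number of paths explored, which is why this kind of direct count is only practical for small lengths.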

Fig. 1 - A vertex $i$, and its neighbors $j$ and $k \in \partial i \setminus j$. Each oriented edge carries a
message $y$ involved in Eq. (1).

Application to random graphs: analytical results. Consider now random graph ensembles
with fixed degree distribution, and call $q_k$ the fraction of vertices having degree $k$. We assume
that $q_k$ decays fast enough for large connectivities, so that all its moments are well defined.
Let us introduce the average degree $c = \sum_k k\, q_k$, and the probability that the end-vertex of a
randomly chosen edge has degree $k + 1$, $\tilde{q}_k = (k + 1)\, q_{k+1}/c$. The typical (quenched) entropy
density is defined by $\sigma_q(\ell) = \overline{\ln \mathcal{N}_{\ell N}(G)}/N$, where the overline denotes an average over the
random graph ensemble. In contrast the computation of [14] yields the annealed entropy
$\sigma_a(\ell) = \ln \overline{\mathcal{N}_{\ell N}(G)}/N$.
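
The quenched and annealed entropies are ordered by Jensen's inequality, since the logarithm is concave. This is a standard fact, written here in the notation of this section and for graphs on which the number of circuits is nonzero:

\[ \sigma_q(\ell) = \frac{1}{N}\, \overline{\ln \mathcal{N}_{\ell N}(G)} \;\le\; \frac{1}{N} \ln \overline{\mathcal{N}_{\ell N}(G)} = \sigma_a(\ell) . \]

The annealed entropy is thus an upper bound on the typical one, and the two can differ when rare graphs with exceptionally many circuits dominate the average.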

Fig. 2 - Solid line: the number of circuits in a graph of the Internet. Inset: magnification of the
small length results; the symbols have been obtained by exhaustive enumeration of the circuits of
length $L = 3, 4, 5$. Dashed line: typical entropy in the random graph ensemble with the same
connectivity distribution.

Fig. 3 - Typical entropy for Poissonian graphs with $c = 2$: cavity computation (solid line) and
median number of cycles determined by exhaustive enumeration in samples of 200,000 graphs
of size $N = 60$ (dashed) and $N = 116$ (dotted). Symbols obtained by an extrapolation of the
numerical results for several values of $N$ to the limit $N \to \infty$.

Running the algorithm defined above on a graph of the ensemble leads to a random (with
respect to the choice of the graph) set of messages $y^*$. The assumptions of the so-called
cavity method at the replica-symmetric level [19] lead to a self-consistent equation for the
distribution $P$ of the messages $y$ found on a randomly chosen directed edge:

\[ P(y) = \sum_{k \ge 0} \tilde{q}_k \int \prod_{i=1}^{k} dy_i\, P(y_i)\; \delta\Big(y - g\Big(\sum_{i=1}^{k} y_i, \sum_{i=1}^{k} y_i^2, u\Big)\Big) , \tag{4} \]

where $g(a, b, u) = ua/(1 + u^2(a^2 - b)/2)$ (see Eq. (1)). From the solution of this distributional
equation, easily found numerically by means of a population dynamics algorithm [20], one can
use Eq. (3) to compute the typical entropy of the graphs of the ensemble, $\sigma_q(\ell)$, parametrized
by $u$. We present in Fig. 3 the results of this approach on Poissonian graphs (i.e.
$q_k = e^{-c} c^k/k!$) with mean degree $c = 2$, along with a confirmation by exhaustive enumeration on
finite size samples.
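
A population dynamics scheme of the kind mentioned above can be sketched as follows for Poissonian graphs. This is an illustration under stated assumptions, not the procedure of [20]. It assumes the distributional equation takes the usual cavity form, in which a new message is $g(\sum_i y_i, \sum_i y_i^2, u)$ computed from $k$ messages drawn independently from $P$, with $k$ drawn from $\tilde{q}_k$ (again Poissonian with mean $c$ in this ensemble). The function names, population size and number of sweeps are arbitrary choices.

```python
import math
import random


def g(a, b, u):
    # g(a, b, u) = u a / (1 + u^2 (a^2 - b) / 2), as defined in the text.
    return u * a / (1.0 + 0.5 * u * u * (a * a - b))


def sample_poisson(c, rng):
    # Knuth's multiplication method; adequate for small mean c.
    limit, k, p = math.exp(-c), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def population_dynamics(c, u, pop_size=2000, sweeps=50, seed=0):
    """Return a population of messages representing P for Poissonian graphs."""
    rng = random.Random(seed)
    pop = [rng.random() for _ in range(pop_size)]
    for _ in range(sweeps * pop_size):
        k = sample_poisson(c, rng)  # number of incoming messages
        ys = [pop[rng.randrange(pop_size)] for _ in range(k)]
        new = g(sum(ys), sum(v * v for v in ys), u)
        pop[rng.randrange(pop_size)] = new  # replace a random member
    return pop


# Usage: mean degree c = 2, fugacity u = 1.
population = population_dynamics(c=2.0, u=1.0)
```

Averages over the returned population then stand in for averages over $P$ when evaluating the entropy; the statistical error decreases as the population size grows.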

An analytic resolution of Eq. (4) is possible only for the very particular case of random
regular graphs, for which all $q_k$ but one vanish. We find that $P$ reduces to a single Dirac
distribution at $y^*(c)$, solution of $y^* = g((c-1)y^*, (c-1)(y^*)^2, u)$. In that case the fluctuations
of $\mathcal{N}_L$ are sufficiently small for the annealed and quenched averages to coincide, and the result
obtained rigorously in [11] is recovered [8].
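
As a worked check of the regular case, the fixed-point condition can be solved in closed form using only the definition of $g$ given above. The steps below are a sketch, assuming $c \ge 3$ and $y^* > 0$:

\[ y^* = \frac{u (c-1)\, y^*}{1 + \tfrac{u^2}{2}\left[(c-1)^2 - (c-1)\right](y^*)^2} \quad\Longrightarrow\quad (y^*)^2 = \frac{2\left[u(c-1) - 1\right]}{u^2 (c-1)(c-2)} . \]

A positive solution exists only for $u > u_0 = 1/(c-1)$, so that the slope of the entropy at the origin is $-\ln u_0 = \ln(c-1)$.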

Some analytical predictions can be made when the random graphs are not purely regular,
even if $P$ is not known explicitly. First of all, the fraction $\zeta$ of strictly vanishing messages is
found to be the smallest root in $[0, 1]$ of $\zeta = \sum_{k \ge 0} \tilde{q}_k\, \zeta^k$. Following the above interpretation
of the null messages, the fraction of edges that belong to the 2-core is $(1 - \zeta)^2$. Moreover its
connectivity distribution can also be expressed from $\zeta$ and $q_k$, and these predictions checked
against the solution of the differential equations describing the leaf removal algorithm [10,17].
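
The smallest root of this equation is easy to obtain numerically. The sketch below iterates the map from $\zeta = 0$: because the right-hand side is increasing in $\zeta$, the iterates grow monotonically towards the smallest fixed point. The function name and the list-based input format are illustrative assumptions.

```python
import math


def vanishing_message_fraction(q, max_iter=10_000, tol=1e-14):
    """Return (zeta, fraction of edges in the 2-core).

    q[k] is the fraction of vertices of degree k (a finite or truncated list).
    """
    c = sum(k * qk for k, qk in enumerate(q))
    # q_tilde[k] = (k + 1) q[k + 1] / c
    q_tilde = [(k + 1) * q[k + 1] / c for k in range(len(q) - 1)]
    zeta = 0.0
    for _ in range(max_iter):
        new = sum(qt * zeta ** k for k, qt in enumerate(q_tilde))
        converged = abs(new - zeta) < tol
        zeta = new
        if converged:
            break
    return zeta, (1.0 - zeta) ** 2


# Usage: Poissonian degrees with mean c = 2, truncated at degree 60.
c = 2.0
poisson = [math.exp(-c) * c ** k / math.factorial(k) for k in range(61)]
zeta, core_edge_fraction = vanishing_message_fraction(poisson)
```

For Poissonian graphs the equation reduces to $\zeta = e^{-c(1-\zeta)}$, which gives $\zeta \approx 0.203$ at $c = 2$. For a regular graph of degree 3 the smallest root is $\zeta = 0$ and all edges belong to the 2-core.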

One can also set up a systematic expansion of $\sigma_q$ around $\ell = 0$. To state the results in a
compact way, let us define the factorial moments of $\tilde{q}_k$ as

\[ \mu_n = \sum_{k \ge n} \tilde{q}_k\, k(k-1)\cdots(k-n+1) . \]

The coefficients of the second order expansion of the entropy read: 
Comparing this expansion with the annealed computation of [14], one finds that the first
derivatives are equal in both computations, and match the known results for circuits of finite 
size. However the second derivatives turn out to be different, 

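\[ \left.\frac{d^2\sigma_a}{d\ell^2}\right|_{\ell=0} - \left.\frac{d^2\sigma_q}{d\ell^2}\right|_{\ell=0} \;\propto\; \left(\mu_2 - \mu_1(\mu_1 - 1)\right)^2 . \tag{5} \]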

It is straightforward to show from Eq. (5) that the expansions of the annealed and quenched
entropies coincide only if the distribution $q_k$ is supported by a single integer, in other words
in the random regular graph case.

Another limit that can be investigated analytically is the one of maximal length circuits,
$\ell_{\max}$, reached here when $u$ gets large. We need to distinguish two cases: if the connectivity
distribution is supported on the integers larger than or equal to 3, the cavity computation predicts
$\ell_{\max} = 1$, and the graphs in such ensembles are typically Hamiltonian. Interestingly, this
was conjectured by Wormald in [12]. Even if the present statistical mechanics approach does
not provide a rigorous proof of the conjecture, it allows one to make it quantitative (with the
prediction of the typical entropy of such Hamiltonian circuits, $\sigma_q(1)$). Moreover it gives a
hint at why usual probabilistic methods are not powerful enough to prove the conjecture (the
quenched entropy is strictly smaller than the annealed one in general). Note that this property
concerning Hamiltonian circuits crucially relies on the fast decay of the degree distribution:
it was shown in [14] that it can be invalidated when $q_k$ has power-law tails.

As soon as the connectivity distribution of the 2-core contains a finite fraction of sites
of degree 2, the 2-core cannot be Hamiltonian. Consider indeed a vertex of degree $k$, surrounded
by $k'$ neighbors of degree 2, with $k \ge k' \ge 3$: it is obvious that no circuit can visit more
than two of these $k'$ sites. As the number of such forbidden vertices is extensive, one has
$L_{\max}/N_{\rm core} < 1$. The quantity $\ell_{\max}$ can be computed by taking analytically the appropriate
limit in Eq. (4) on $P$, resulting in a simpler distributional equation which can be
solved analytically in the limit of infinitesimal fraction of degree 2 sites [10].

Discussion and Conclusion. We mentioned in the introduction the possibility of using
global properties of graphs to test the relevance of random graph ensembles for the
description of real-world networks. Following this idea, we compared the circuit entropy of the
DIMES Internet graph with the quenched result for the ensemble with the same connectivity
distribution (dashed line in Fig. 2). They turn out to be rather different, suggesting that
random graph ensembles defined only through their connectivity distribution are not a very
precise description of real-world networks. It would thus be interesting to extend our
analytical study to different ensembles of graphs, for instance introducing correlations between the
degrees of neighboring vertices [21], or considering growing models of networks [22].

Concerning the application of the algorithm to individual graphs, two questions should 
be further investigated: can one give general conditions [23, 24] on the graphs which ensure 
the convergence of the BP equations? Can they be sharpened to show that the output of the 
algorithm is a rigorous lower bound on the true number of circuits? Tests on various types of 
graphs show that the iteration procedure is generally very robust against the initial condition 
on the y, and converges to a unique fixed-point. Small counter-examples on which the BP 
equations do not converge can however be easily tailored. 

Large deviations of $\mathcal{N}_L$ around its typical value $e^{N\sigma_q(\ell)}$ could also be an interesting object
of study, using for instance the modified cavity method of [25]. One may in particular seek the
exponentially small probability that a randomly drawn graph is not Hamiltonian for ensembles
whose typical instances are.

Finally, we hope that our algorithm will be useful for analyzing graph data available in
various contexts besides the Internet, e.g. regulatory and more generally biological interaction
networks. The huge size of these data sets makes the use of exact analysis procedures
impossible, not to say unnecessary when data are plagued by false positives and/or negatives, as is
often the case in biological experiments, e.g. DNA chips. Approximate and fast algorithms
may then prove adequate.